data analysis

1. Description

data_analysis is a command-line tool for performing data quality checks, calculations, search operations, transformations, and file operations on text-based data files. It runs independently from any directory with no external dependencies.

Typical run flow: Choose run mode → Choose file → Enter header/footer rows → Choose category → Choose activity → Enter inputs → Confirm → Run → Review result

2. Starting the Tool

starting_the_tool

The data_analysis tool can be run from any directory without any dependency. The username is expected for logging and querying purposes.

3. Run Mode Selection

run_mode_selection

Select how much system resources the tool should use during processing. Choose based on how fast you need results versus how much load you want on the server.

#	Option	Description
1	Rabbit Mode	Uses maximum resources available in the system. Best when you want faster processing.
2	Tortoise Mode	Uses minimum basic required resources. Best when you want lower system usage.

4. File Selection

file_selection

Choose the input file you want to work on. You can type the path directly or use the file picker suggestions. There is no limitation to absolute or relative path.

5. Delimiter Detection

delimiter_detection

The delimiter of the txt file is automatically detected. If auto-detection fails, you will be prompted to manually select the delimiter from the options shown below.

delimiter_manual

6. Header and Footer Counts

header_footer_counts

Tell the tool how many rows at the top and bottom of the file are not actual data. This ensures those rows are excluded from all calculations and operations.

#	Option	Description
1	Header rows	Rows at the top of the file that should not be treated as normal data.
2	Footer rows	Rows at the bottom of the file that should not be treated as normal data.

Example: If the first row contains column names and the last 2 rows contain summary text, enter 1, 2. If there are no header and no footer rows, enter 0, 0 or just press Enter.

main_menu

For each menu category, all available sub-options are displayed. The menu is categorized based on similar operations that can be applied to the file.

#	Option	Description
1	Up / Down	Navigate between options or select an option.
2	Enter / ALT + Enter	Confirm column selection or file selection.
3	Enter	Confirm multi-column selection.
4	Esc	Move one step backwards.
5	Backspace / Delete	Clear or remove input.

7.2 Writing Output

Many calculations or validations can be written to an output file by selecting the ✎ Write to a file option and selecting a file path.

7.3 Column Selection Behaviour

If a header is passed and is correct, columns can be fetched via dropdown.

column_selection_dropdown

If a header is passed but data is wrong, or if no header is passed, column numbers are expected from the user.

column_selection_number

8. Data Quality

8.1 Column Validation

column_validation_menu

column_validation_result

Use this to check whether values in selected columns follow expected rules. Each column can be validated against one or more rule types, and any failing rows are reported.

Available validation types:

date
format
allowed values list
exact length
maximum length
minimum length
must not be empty
must be empty
valid number
sign check
numeric range
regex pattern
no extra whitespace

8.2 Column Count Check

column_count_check

Use this to check whether every row in the file has the expected number of columns. Useful for catching malformed rows where delimiters are missing or extra.

#	Option	Description
1	Exact	Row must have exactly the expected number of columns.
2	Less than	Flags rows that have fewer columns than expected.
3	More than	Flags rows that have more columns than expected.

column_count_result

8.3 Duplicate

duplicate

Use this to detect duplicate rows based on one or more key columns. Provides four options depending on whether you want to view, count, remove, or extract the duplicates.

#	Option	Description
1	Find duplicates	Displays all rows that are duplicated based on the selected key columns.
2	Count duplicates	Returns the total count of duplicate rows found in the file.
3	Remove duplicates	Removes all duplicate rows and writes the distinct records to an output file, keeping the first occurrence of each.
4	Write to a file	Writes only the duplicate records to a separate output file for review.

duplicate_result

8.4 File Profile

file_profile

Use this to get a column-level summary of the entire input file. Helps you understand the shape and quality of your data before running any operations on it.

Information shown per column:

Blanks
Min
Max
Average
Distinct values
Detected type

8.5 Cross-Column Comparison

cross_column_comparison

Use this to compare values across two numeric columns within the same file. Useful for validating relationships between columns, such as ensuring one value is always greater than another.

Operators available:

Greater than
Less than
Equal to
Greater than or equal to
Less than or equal to

9. Calculation

9.1 Sum

sum

Use this to total one or more numeric columns across all rows or a filtered subset. You can narrow down which rows to include using column-value filters before summing.

#	Option	Description
1	Optional filter	Choose column(s) and values to include only matching rows. Conditions: equal to, not equal to.
2	Sign options	Control which values to include: all values, positive only, negative only, or absolute values.

9.2 Group By

group_by

Use this to group rows by one or more columns and calculate aggregated results per group. The output can be reviewed on screen or written to a file.

Available calculations per group:

Sum
Average
Weighted average
Count
Maximum
Minimum
Count distinct

9.3 Reconciliation

reconciliation

Use this to reconcile data between two files. Select a recon type to compare either the total sum of a column or the total row count across File A and File B.

#	Option	Description
1	Sum	Compares the total of selected columns between File A and File B and reports any difference.
2	Row count	Compares the total number of rows in File A versus File B.

10. Search & Filter

10.1 Find / Search

find_search

Use this to find rows based on a value in a selected column. Choose a match mode to control how the search is applied, then view results on screen or write them to a file.

#	Option	Description
1	Exact match	Row must match the search value exactly.
2	Contains	Row contains the search value anywhere within it.
3	Starts with	Row value begins with the search string.
4	Ends with	Row value ends with the search string.
5	Regex	Row matches the provided regular expression pattern.
6	Does not match	Row does not match the provided value.

10.2 Fetch by Date Range

fetch_by_date_range

Use this to extract rows where a date column falls within a required range. You must provide the date format used in the file so the tool can parse it correctly.

#	Option	Description
1	Between	Returns rows where the date falls between two specified dates (inclusive).
2	Greater than	Returns rows where the date is after the specified date.
3	Less than	Returns rows where the date is before the specified date.

11. Transformation

11.1 Add Column

add_column

Use this to add a new column to the file. Choose the type of value the new column should contain, and the tool appends it to every row in the output.

#	Option	Description
1	Constant value	Every row gets the same fixed value in the new column.
2	Formula / expression	The new column value is computed from an expression using existing columns.
3	Row number	The new column contains the sequential row index.
4	Current date	The new column is populated with today's date.

11.2 Find & Replace

find_and_replace

Use this to replace values in selected columns or across all columns. Choose a match mode to control how values are matched before replacement is applied.

Match modes:

Exact — replaces only rows where the value matches exactly.
Contains — replaces in any row where the value appears within the cell.
Regex — replaces values matching the provided regular expression.

11.3 Remove Rows

remove_rows

Use this to remove unwanted rows from the file before further processing. Select the mode that matches which rows you want to drop.

#	Option	Description
1	Header	Removes the first row of the file.
2	Footer	Removes the last row of the file.
3	First N rows	Removes the first N rows as specified by the user.
4	Row range	Removes rows within a specified start and end range.
5	Blank rows	Removes all rows that are completely empty.

11.4 Sort

sort

Use this to sort the file by one or more columns. For each selected column, choose ascending or descending order. Multi-column sort applies the order in the sequence you select them.

11.5 Trim / Normalize

trim_normalize

Use this to strip leading and trailing spaces from values in the file. Can be applied to specific columns or to all columns at once to clean up inconsistent spacing.

11.6 Column Select

column_select

Use this to keep only the columns you need in the output file. Columns are written in the exact order you select them, allowing you to reorder as well as filter.

12. File Operations

12.1 Merge

merge

Use this to combine two files into one. Provide key columns from both files for join-based merges. The merged result is written directly to an output file.

#	Option	Description
1	Inner join	Keeps only rows where the key exists in both File A and File B.
2	Left join	Keeps all rows from File A; matches from File B where available.
3	Right join	Keeps all rows from File B; matches from File A where available.
4	Full outer join	Keeps all rows from both files, matched where keys align.
5	Append (vertical stack)	Stacks File B rows directly below File A rows without any key matching.

12.2 Lookup

lookup

Use this to enrich the main file by appending columns from a reference file. Match is done on key columns from both files, and the selected reference value columns are appended to each matched row.

12.3 Unique

unique

Use this to extract unique values based on selected columns. Useful for quickly seeing what distinct values exist in a column and optionally how many times each appears.

#	Option	Description
1	Values only	Lists distinct values without occurrence counts.
2	Values with counts	Lists distinct values along with how many times each appears in the file.

12.4 Split

split

Use this to break one file into multiple output files. When splitting by column value, a separate file is created for each unique value in that column, named as output_filename_columnvalue.txt.

#	Option	Description
1	Split by column value	Creates one output file per unique value in the selected column. Output files are named as outputfilename\<column_value>.txt.
2	Split by row count	Splits the file into chunks where each output file contains N rows as specified by the user.

12.5 File Diff

file_diff

Use this to compare two files and identify what is different between them. The key column(s) must exist in both files and should represent the same kind of record, such as an ID, email, or order number.

#	Option	Description
1	Key only	Checks which records exist in one file but are missing in the other, based on the key column(s) alone. Does not compare values.
2	Key + value comparison	Matches records by key column(s), then compares the value column(s) to find rows where the key exists in both files but the data differs.

data analysis

1. Description

2. Starting the Tool

3. Run Mode Selection

4. File Selection

5. Delimiter Detection

6. Header and Footer Counts

7. Main Menu

7.1 Navigation Keys

7.2 Writing Output

7.3 Column Selection Behaviour

8. Data Quality

8.1 Column Validation

8.2 Column Count Check

8.3 Duplicate

8.4 File Profile

8.5 Cross-Column Comparison

9. Calculation

9.1 Sum

9.2 Group By

9.3 Reconciliation

10. Search & Filter

10.1 Find / Search

10.2 Fetch by Date Range

11. Transformation

11.1 Add Column

11.2 Find & Replace

11.3 Remove Rows

11.4 Sort

11.5 Trim / Normalize

11.6 Column Select

12. File Operations

12.1 Merge

12.2 Lookup

12.3 Unique

12.4 Split

12.5 File Diff

1. Description​

2. Starting the Tool​

3. Run Mode Selection​

4. File Selection​

5. Delimiter Detection​

6. Header and Footer Counts​

7. Main Menu​

7.1 Navigation Keys​

7.2 Writing Output​

7.3 Column Selection Behaviour​

8. Data Quality​

8.1 Column Validation​

8.2 Column Count Check​

8.3 Duplicate​

8.4 File Profile​

8.5 Cross-Column Comparison​

9. Calculation​

9.1 Sum​

9.2 Group By​

9.3 Reconciliation​

10. Search & Filter​

10.1 Find / Search​

10.2 Fetch by Date Range​

11. Transformation​

11.1 Add Column​

11.2 Find & Replace​

11.3 Remove Rows​

11.4 Sort​

11.5 Trim / Normalize​

11.6 Column Select​

12. File Operations​

12.1 Merge​

12.2 Lookup​

12.3 Unique​

12.4 Split​

12.5 File Diff​

1. Description

2. Starting the Tool

3. Run Mode Selection

4. File Selection

5. Delimiter Detection

6. Header and Footer Counts

7. Main Menu

7.1 Navigation Keys

7.2 Writing Output

7.3 Column Selection Behaviour

8. Data Quality

8.1 Column Validation

8.2 Column Count Check

8.3 Duplicate

8.4 File Profile

8.5 Cross-Column Comparison

9. Calculation

9.1 Sum

9.2 Group By

9.3 Reconciliation

10. Search & Filter

10.1 Find / Search

10.2 Fetch by Date Range

11. Transformation

11.1 Add Column

11.2 Find & Replace

11.3 Remove Rows

11.4 Sort

11.5 Trim / Normalize

11.6 Column Select

12. File Operations

12.1 Merge

12.2 Lookup

12.3 Unique

12.4 Split

12.5 File Diff